… now we can begin exploring!
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.466
Model:                            OLS   Adj. R-squared:                  0.460
Method:                 Least Squares   F-statistic:                     74.32
Date:                Thu, 15 Jun 2023   Prob (F-statistic):           3.16e-13
Time:                        14:14:59   Log-Likelihood:                -108.49
No. Observations:                  87   AIC:                             221.0
Df Residuals:                      85   BIC:                             225.9
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.0480      0.226      4.630      0.000       0.598       1.498
x1             0.0989      0.011      8.621      0.000       0.076       0.122
==============================================================================
Omnibus:                        7.712   Durbin-Watson:                   2.161
Prob(Omnibus):                  0.021   Jarque-Bera (JB):               14.664
Skew:                          -0.127   Prob(JB):                     0.000654
Kurtosis:                       4.995   Cond. No.                         49.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Really, we’re just fitting a line
\(y_i = \beta_0 + \kappa T_i + \beta_1 X_{1i} + \ldots + \beta_k X_{ki} + u_i\)
But that line can get super, super squiggly
Machine learning is just using compute to make the best multidimensional squiggle
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the iris data and hold out half of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit a Gaussian naive Bayes classifier and predict the held-out labels
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
avarotsis@no10.gov.uk
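To see how good the squiggle actually is, score the predictions against the held-out labels. This sketch re-runs the same split so it stands alone; `accuracy_score` is a standard sklearn helper, not part of the original snippet:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Same data and split as above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit, predict, and score on the unseen half
gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"held-out accuracy: {acc:.3f}")
```

With this split the classifier gets the large majority of the 75 test points right, which is the point of the slide: a few lines of compute buys a very serviceable multidimensional squiggle.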
Andreas Varotsis_10DS @ GovDataScience Slack
@andreasthinks on Twitter
@andreasthinks@fosstodon.org
Andreas Varotsis @ Kaggle